Abstract: High-dimensional data clustering is an emerging research field, because the high scattering of data points makes clustering such data a major challenge. Most traditional clustering methods are poorly suited to high-dimensional data. In this paper, a new algorithm, “CLQ-CS”, is proposed that combines the subspace clustering algorithm CLIQUE with the Cuckoo Search strategy. The proposed CLQ-CS algorithm consists of two phases. The first phase is data pre-processing, in which CLIQUE performs subspace relevance analysis to find the dense subspaces. In the second phase, a global search strategy called cuckoo search is used to cluster the subspaces detected in the first phase. In this way, the problem of losing regions that are actually densely populated, which arises from the high scattering of data points in high-dimensional space, can be overcome. Experiments on large, high-dimensional synthetic and real-world datasets demonstrate that CLQ-CS achieves higher efficiency and better cluster accuracy. Moreover, the proposed algorithm not only yields accurate results as the number of dimensions increases but also outperforms the individual algorithms as the size of the dataset increases.

Keywords: CLIQUE, Cuckoo Search, High Dimensional Data, Subspace Clustering, CLQ-CS.